-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New partitioner: KD-Tree partitioner #127
base: master
Are you sure you want to change the base?
Conversation
…rtitions are received
to continue my work
Hi @abualia4, thanks for the PR. Sorry it took me a long time to review it, but I will add my comments now. Please read them, and implement change if any. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @abualia4. Could you please have a look at the comments and implement the suggestions for change in order to merge the PR.
fitsfn= | ||
MASTERURL=spark://134.158.75.222:7077 | ||
fitsfn=hdfs://134.158.75.222:8020//lsst/LSST1Y/out_srcs_s1_0.fits | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general rule is to not commit URL in a repo. Please keep the previous empty variables:
MASTERURL=
fitsfn=
target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \ | ||
$inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "username" "api_key" | ||
$inputfn "position_x_mock,position_y_mock,position_z_mock" false "kdtree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the general rule is to never push sensitive information in a repo! Could you remove your username and api key?
# --jars ${SP},${HP} --packages ${PACK} \ | ||
# --class com.astrolabsoftware.spark3d.examples.VisualizePart \ | ||
# target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \ | ||
# $inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the general rule is to never push sensitive information in a repo! Could you remove your username and api key?
@@ -0,0 +1,141 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give a meaningful name for this file? test.scala
is too vague, and something like TestKDTree.scala
would be better.
var minX: Double, var maxX: Double, | ||
var minY: Double, var maxY: Double, | ||
var minZ: Double, var maxZ: Double) | ||
extends Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this class? it is not used elsewhere, so I would recommend to remove this file from the repo.
@@ -84,7 +81,7 @@ class OctreePartitioner (octree: Octree, grids : List[BoxEnvelope]) extends Spat | |||
* @return list of Tuple of containing partitions and their index/partition ID's | |||
*/ | |||
override def getPartitionNodes[T <: Shape3D](spatialObject: T): List[Tuple2[Int, Shape3D]] = { | |||
|
|||
println("getPartitionNodes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary println
@@ -22,7 +22,7 @@ import scala.collection.mutable.ListBuffer | |||
|
|||
class OctreePartitioning (private val octree: Octree) | |||
extends Serializable { | |||
|
|||
println("OctreePartitioning") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary println
@@ -40,7 +40,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Point3D | |||
* | |||
*/ | |||
class OnionPartitioner(grids : List[ShellEnvelope]) extends SpatialPartitioner(grids) { | |||
|
|||
println("OnionPartitioner") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary println
@@ -31,7 +31,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Shape3D._ | |||
* | |||
*/ | |||
class OnionPartitioning extends Serializable { | |||
|
|||
println ("OnionPartitionong") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary println
@@ -0,0 +1,81 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, remove this unnecessary file from the repo.
else | ||
println("Please determine the number of partitions as 1, 2, 4, 8 and so on") | ||
} | ||
else For others |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think else For others
is a correct Scala statement... You probably miss a #
before For
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, this is why the continuous integration is failing.
I'll also take a tomorrow. Seems awesome at the first glance 👍
…On Thu, Mar 19, 2020, 6:15 PM Julien ***@***.***> wrote:
***@***.**** requested changes on this pull request.
Thanks @abualia4 <https://github.com/abualia4>. Could you please have a
look at the comments and implement the suggestions for change in order to
merge the PR.
------------------------------
In run_part_scala.sh
<#127 (comment)>
:
> @@ -10,8 +10,9 @@ PACK=com.github.astrolabsoftware:spark-fits_2.11:0.7.1
SF=target/scala-2.11/spark3d_2.11-${VERSION}.jar
HP=lib/jhealpix.jar
-MASTERURL=
-fitsfn=
+MASTERURL=spark://134.158.75.222:7077
+fitsfn=hdfs://134.158.75.222:8020//lsst/LSST1Y/out_srcs_s1_0.fits
+
The general rule is to not commit URL in a repo. Please keep the previous
empty variables:
MASTERURL=
fitsfn=
------------------------------
In run_viz_scala.sh
<#127 (comment)>
:
> target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \
- $inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "username" "api_key"
+ $inputfn "position_x_mock,position_y_mock,position_z_mock" false "kdtree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw"
Again, the general rule is to never push sensitive information in a repo!
Could you remove your username and api key?
------------------------------
In run_viz_scala.sh
<#127 (comment)>
:
>
-spark-submit \
- --master ${MASTERURL} \
- --driver-memory 4g --executor-memory 28g --executor-cores 17 --total-executor-cores 102 \
- --jars ${SF},${HP} --packages ${PACK} \
- --class com.astrolabsoftware.spark3d.examples.VisualizePart \
- target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \
- $inputfn "Z_COSMO,RA,DEC" true "onion" 8 0.0001 "username" "api_key"
+ #inputfn=hdfs://134.158.75.222:8020/user/julien.peloton/dc2
+ #spark-submit \
+ # --master ${MASTERURL} \
+ # --driver-memory 4g --executor-memory 28g --executor-cores 17 --total-executor-cores 102 \
+ # --jars ${SP},${HP} --packages ${PACK} \
+ # --class com.astrolabsoftware.spark3d.examples.VisualizePart \
+ # target/scala-${SCAlA_VERSION}/spark3d_${SCAlA_VERSION}-${VERSION}.jar \
+ # $inputfn "position_x_mock,position_y_mock,position_z_mock" false "octree" 512 0.001 "abualia4" "GFz0cUDumcKcnhpMd8qw"
Again, the general rule is to never push sensitive information in a repo!
Could you remove your username and api key?
------------------------------
In src/main/scala/com/spark3d/examples/test.scala
<#127 (comment)>
:
> @@ -0,0 +1,141 @@
+/*
Could you give a meaningful name for this file? test.scala is too vague,
and something like TestKDTree.scala would be better.
------------------------------
In src/main/scala/com/spark3d/geometryObjects/BoundaryKD.scala
<#127 (comment)>
:
> + *
+ * Default constructor is kept private to avoid creating an instance of the cube Envelope class without initialising
+ * the min/max coordinates along axes incorrectly.
+ *
+ * @param minX minimum coordinate of cube Envelope along X-axis
+ * @param maxX maximum coordinate of cube Envelope along X-axis
+ * @param minY minimum coordinate of cube Envelope along Y-axis
+ * @param maxY maximum coordinate of cube Envelope along Y-axis
+ * @param minZ minimum coordinate of cube Envelope along Z-axis
+ * @param maxZ maximum coordinate of cube Envelope along Z-axis
+ */
+class BoundaryKD private(
+ var minX: Double, var maxX: Double,
+ var minY: Double, var maxY: Double,
+ var minZ: Double, var maxZ: Double)
+ extends Serializable {
what is the purpose of this class? it is not used elsewhere, so I would
recommend to remove this file from the repo.
------------------------------
In src/main/scala/com/spark3d/geometryObjects/BoxEnvelope.scala
<#127 (comment)>
:
> @@ -11,7 +11,7 @@
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
- * limitations under the License.
+ * limitations under the License.env
Please, remove the .env added mistakenly.
------------------------------
In src/main/scala/com/spark3d/geometryObjects/BoxEnvelope.scala
<#127 (comment)>
:
> @@ -624,23 +627,49 @@ class BoxEnvelope private(
* being checked for containment
* @param z the z-coordinate of the point for which this cube Envelope is
* being checked for containment
- * @return true if (x, y, z) lies in the interior or
- * on the boundary of this cube Envelope, false if the cube Envelope is null.
+ * @return true if (x, y, z) lies in the interior of this cube Envelope, false if the cube Envelope is null.
Typo - you removed a part of the documentation
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/KDtreePartitioner.scala
<#127 (comment)>
:
> +
+
+ /**
+ * Get the number of partitions in this partitioning
+ *
+ * @return the number of partitions
+ */
+ override def numPartitions: Int = {
+ grids.size
+
+ }
+
+ /**
+ *
+ *
+ */
Documentation missing
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/KDtreePartitioner.scala
<#127 (comment)>
:
> +
+ partitionId=0
+ for(element<-grids){
+ if(element.coversKD(c0,c1,c2)){
+ return partitionId
+ }
+ partitionId+=1
+ }
+
+ 0
+
+ }
+
+ /**
+ *
+ */
Documentation missing
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/KDtreePartitioner.scala
<#127 (comment)>
:
> + }
+
+ 0
+
+ }
+
+ /**
+ *
+ */
+ override def getPartitionNodes[T <: Shape3D](spatialObject: T): List[Tuple2[Int, Shape3D]] = {
+ null
+ }
+
+ /**
+ *
+ */
Documentation missing
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/KDtreePartitioner.scala
<#127 (comment)>
:
> + *
+ */
+ override def getPartitionNodes[T <: Shape3D](spatialObject: T): List[Tuple2[Int, Shape3D]] = {
+ null
+ }
+
+ /**
+ *
+ */
+ override def getNeighborNodes[T <: Shape3D](spatialObject: T, inclusive: Boolean = false): List[Tuple2[Int, Shape3D]] = {
+ null
+ }
+
+ /**
+ *
+ */
Documentation missing
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioner.scala
<#127 (comment)>
:
> @@ -44,8 +43,7 @@ class OctreePartitioner (octree: Octree, grids : List[BoxEnvelope]) extends Spat
*
*/
override def placeObject[T <: Shape3D](spatialObject: T): Iterator[Tuple2[Int, T]] = {
-
- val result = HashSet.empty[Tuple2[Int, T]]
+ val result = HashSet.empty[Tuple2[Int, T]]
Incorrect indentation
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioner.scala
<#127 (comment)>
:
> @@ -66,8 +64,7 @@ class OctreePartitioner (octree: Octree, grids : List[BoxEnvelope]) extends Spat
*
*/
override def placePoints(c0: Double, c1: Double, c2: Double, isSpherical: Boolean) : Int = {
-
- val spatialObject = new Point3D(c0, c1, c2, isSpherical)
+ val spatialObject = new Point3D(c0, c1, c2, isSpherical)
Incorrect indentation
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioner.scala
<#127 (comment)>
:
> @@ -84,7 +81,7 @@ class OctreePartitioner (octree: Octree, grids : List[BoxEnvelope]) extends Spat
* @return list of Tuple of containing partitions and their index/partition ID's
*/
override def getPartitionNodes[T <: Shape3D](spatialObject: T): List[Tuple2[Int, Shape3D]] = {
-
+ println("getPartitionNodes")
Remove unnecessary println
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/OctreePartitioning.scala
<#127 (comment)>
:
> @@ -22,7 +22,7 @@ import scala.collection.mutable.ListBuffer
class OctreePartitioning (private val octree: Octree)
extends Serializable {
-
+ println("OctreePartitioning")
Remove unnecessary println
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/OnionPartitioner.scala
<#127 (comment)>
:
> @@ -40,7 +40,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Point3D
*
*/
class OnionPartitioner(grids : List[ShellEnvelope]) extends SpatialPartitioner(grids) {
-
+ println("OnionPartitioner")
Remove unnecessary println
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/OnionPartitioning.scala
<#127 (comment)>
:
> @@ -31,7 +31,7 @@ import com.astrolabsoftware.spark3d.geometryObjects.Shape3D._
*
*/
class OnionPartitioning extends Serializable {
-
+println ("OnionPartitionong")
Remove unnecessary println
------------------------------
In src/main/scala/com/spark3d/spatialPartitioning/learn.scala
<#127 (comment)>
:
> @@ -0,0 +1,81 @@
+/*
Please, remove this unnecessary file from the repo.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#127 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADBQ6XWPSNHLNUP3DJJVWADRIIHXJANCNFSM4IKJ3HUQ>
.
|
Thanks @mayurdb ! I hope you are in a safe place. |
General
New partitioning technique based on balanced KD-tree was developed.
coverKD
method is added toBoxEnvelope.scala
file, (spark3D/src/main/scala/com/spark3d/geometryObjects/
).Also I made minor modifications as follows:
/home/alia/spark3D/src/main/scala/com/spark3d/Partitioners.scala
)./home/alia/spark3D/src/main/scala/com/spark3d/Partitioners.scala
)./home/alia/spark3D/src/main/scala/com/spark3d/utils/GridType.scala
).spark3D/
).New added code
The main files for the new Partitioner (KD-tree) are as follows:
spark3D/src/main/scala/com/spark3d/spatialPartitioning/
).spark3D/src/main/scala/com/spark3d/spatialPartitioning/
).spark3D/src/main/scala/com/spark3d/spatialPartitioning/
).spark3D/src/main/scala/com/spark3d/examples/
)spark3D/
).Evaluation of the new code
In order to evaluate the performance of the new spatial partitioning technique which is based on balanced KD-tree, this methodology took empirical approach by comparing it with onion and octree partitioners which were implemented in the spark3D framework, after applying them on two astronomical data sets. These data sets have been generated by CoLoRe fast simulation. The first dataset has more than 37 million 3D points, and data has been randomly drawn along spherical coordinates. The second dataset contains more than seven and a half millions of 3D points, and data has been randomly drawn along Cartesian axes. All implementations were run on the LAL Apache Spark cluster hosted in the virtual data cloud.